# EMBEDDED SYSTEMS

CHAPTER -2

HARDWARE DESIGN ISSUES

# 2. Hardware Design Issues [4 Hrs]

- 2.1 Combination Logic
- 2.2 Sequential Logic
- 2.3 Custom Single-Purpose Processor Design
- 2.4 Optimizing Custom Single-Purpose Processors

#### Introduction

#### Processor

- Digital circuit that performs a computation tasks
- Controller and datapath
- General-purpose: variety of computation tasks
- Single-purpose: one particular computation task
- Custom single-purpose: nonstandard task
- A custom single-purpose processor may be
  - Fast, small, low power
  - But, high NRE, longer time-tomarket, less flexible



#### **CMOS** transistor on silicon

#### Transistor

- The basic electrical component in digital systems
- Acts as an on/off switch
- Voltage at "gate" controls whether current flows from source to drain
- this "gate" with a logic gate







•Three modes based on the magnitude of  $V_{\rm gs}$ : accumulation, depletion and inversion.





#### n-MOS Enhancement Transistor

ullet With  $V_{\rm ds}$  non-zero, the channel becomes smaller closer to the drain.

•When  $V_{\rm ds}$  <=  $V_{\rm gs}$  -  $V_{\rm t}$  (e.g.  $V_{\rm ds}$  = 3V,  $V_{\rm gs}$  = 5V and  $V_{\rm t}$  = 1V), the channel reaches the drain (since  $V_{\rm gd}$  >  $V_{\rm t}$ ).

•This is termed linear , resistive or nonsaturated region. I  $_{\rm ds}$  is a function of both V  $_{\rm gs}$  and V  $_{\rm ds}$  .

4



#### n-MOS Enhancement Transistor

•When  $V_{\rm ds} > V_{\rm gs}$  -  $V_{\rm t}$  (e.g.  $V_{\rm ds} = 5V$ ,  $V_{\rm gs} = 5V$  and  $V_{\rm t} = 1V$ ), the channel is pinched off close to the drain (since  $V_{\rm gd} < V_{\rm t}$ ).

•This is termed saturated region.  $I_{ds}$  is a function of  $V_{gs}$ , almost independent of  $V_{ds}$ .



### **CMOS** transistor implementations

- Complementary Metal
   Oxide Semiconductor
- refer to logic levels
  - Typically 0 is 0V, 1 is 5V
- Two basic CMOS types
  - nMOS conducts if gate=1
  - pMOS conducts if gate=0
  - Hence "complementary"
- Basic gates
  - Inverter, NAND, NOR





#### **Basic logic gates**





# Combinational logic design

#### A) Problem description

y is 1 if a is to 1, or b and c are
1. z is 1 if b or c is to 1, but not
both, or if all are 1.

#### D) Minimized output equations

| y b | c<br>00 | 01          | 11 | 10 |
|-----|---------|-------------|----|----|
| 0   | 0       | 0           | 1  | 0  |
| 1   | 1       | 1           | 1  | 1  |
|     | ·       | <del></del> |    |    |



$$z = ab + b'c + bc'$$

#### B) Truth table

|   | Inputs | Outputs |   |   |  |
|---|--------|---------|---|---|--|
| а | b      | С       | У | Z |  |
| 0 | 0      | 0       | 0 | 0 |  |
| 0 | 0      | 1       | 0 | 1 |  |
| 0 | 1      | 0       | 0 | 1 |  |
| 0 | 1      | 1       | 1 | 0 |  |
| 1 | 0      | 0       | 1 | 0 |  |
| 1 | 0      | 1       | 1 | 1 |  |
| 1 | 1      | 0       | 1 | 1 |  |
| 1 | 1      | 1       | 1 | 1 |  |

#### C) Output equations



# **Combinational components**

| SO n-bit, m x 1 Multiplexor S(log m) n      | I(log n -1) I0                                             | A B n n-bit Adder n n carry sum                               | A B n n-bit Comparator less equal greater                          | n bit, m function ALU S(log m)            |  |
|---------------------------------------------|------------------------------------------------------------|---------------------------------------------------------------|--------------------------------------------------------------------|-------------------------------------------|--|
| O = 10 if S=000 11 if S=001 1(m-1) if S=111 | O0 =1 if I=000<br>O1 =1 if I=001<br><br>O(n-1) =1 if I=111 | sum = A+B<br>(first n bits)<br>carry = (n+1)'th<br>bit of A+B | less = 1 if A <b<br>equal =1 if A=B<br/>greater=1 if A&gt;B</b<br> | O = A op B<br>op determined<br>by S.      |  |
|                                             | With enable input e → all O's are 0 if e=0                 | With carry-in input Ci→ sum = A + B + Ci                      |                                                                    | May have status outputs carry, zero, etc. |  |

## Sequential components



#### Sequential logic design

A) Problem Description
You want to construct a clock
divider. Slow down your preexisting clock so that you
output a 1 for every four
clock cycles





|    | Inputs |   | Outputs |    |   |  |
|----|--------|---|---------|----|---|--|
| Q1 | Q0     | а | 11      | 10 | х |  |
| 0  | 0      | 0 | 0       | 0  | 0 |  |
| 0  | 0      | 1 | 0       | 1  | U |  |
| 0  | 1      | 0 | 0       | 1  | 0 |  |
| 0  | 1      | 1 | 1       | 0  | U |  |
| 1  | 0      | 0 | 1       | 0  | 0 |  |
| 1  | 0      | 1 | 1       | 1  | U |  |
| 1  | 1      | 0 | 1       | 1  | 1 |  |
| 1  | 1      | 1 | 0       | 0  | 1 |  |

D) State Table (Moore-type)

- Given this implementation model
  - Sequential logic design quickly reduces to combinational logic design

#### Sequential logic design (cont.)





#### **Custom single-purpose processor basic model**



controller and datapath



a view inside the controller and datapath

## **Example: Greatest Common Divisor**

- First create algorithm
- Convert algorithm to "complex" state machine
  - Known as FSMD:
     finite-state machine
     with datapath
  - Can use templates to perform such conversion



(c) state diagram

## State diagram templates

Assignment statement

 $\mathbf{a} = \mathbf{b}$ 

next statement



Loop statement

while (cond) {
 loop-body statements
}
next statement



Branch statement

if (c1) cl stmts else if c2

c2 stmts

else

other stmts next statement



#### Creating the datapath

- Create a register for any declared variable
- Create a functional unit for each arithmetic operation
- Connect the ports, registers and functional units
  - Based on reads and writes
  - Use multiplexors for multiple sources
- Create unique identifier
  - for each datapath component control input and output



#### Creating the controller's FSM





- Same structure as FSMD
- Replace complex actions/conditions with datapath configurations



## Splitting into a controller and datapath



# Controller state table for the Greatest Common Divisor (GCD) example

| Inputs |    |    |    |             | Outputs |      |    |    |    |    |       |       |      |      |      |
|--------|----|----|----|-------------|---------|------|----|----|----|----|-------|-------|------|------|------|
| Q3     | Q2 | Q1 | Q0 | x_neq<br>_y | x_lt_y  | go_i | 13 | 12 | I1 | 10 | x_sel | y_sel | x_ld | y_ld | d_ld |
| 0      | 0  | 0  | 0  | *           | *       | *    | 0  | 0  | 0  | 1  | Х     | Х     | 0    | 0    | 0    |
| 0      | 0  | 0  | 1  | *           | *       | 0    | 0  | 0  | 1  | 0  | Х     | Х     | 0    | 0    | 0    |
| 0      | 0  | 0  | 1  | *           | *       | 1    | 0  | 0  | 1  | 1  | Х     | Х     | 0    | 0    | 0    |
| 0      | 0  | 1  | 0  | *           | *       | *    | 0  | 0  | 0  | 1  | Х     | Х     | 0    | 0    | 0    |
| 0      | 0  | 1  | 1  | *           | *       | *    | 0  | 1  | 0  | 0  | 0     | Х     | 1    | 0    | 0    |
| 0      | 1  | 0  | 0  | *           | *       | *    | 0  | 1  | 0  | 1  | Х     | 0     | 0    | 1    | 0    |
| 0      | 1  | 0  | 1  | 0           | *       | *    | 1  | 0  | 1  | 1  | Х     | Х     | 0    | 0    | 0    |
| 0      | 1  | 0  | 1  | 1           | *       | *    | 0  | 1  | 1  | 0  | Х     | Х     | 0    | 0    | 0    |
| 0      | 1  | 1  | 0  | *           | 0       | *    | 1  | 0  | 0  | 0  | Х     | Х     | 0    | 0    | 0    |
| 0      | 1  | 1  | 0  | *           | 1       | *    | 0  | 1  | 1  | 1  | Х     | Х     | 0    | 0    | 0    |
| 0      | 1  | 1  | 1  | *           | *       | *    | 1  | 0  | 0  | 1  | Х     | 1     | 0    | 1    | 0    |
| 1      | 0  | 0  | 0  | *           | *       | *    | 1  | 0  | 0  | 1  | 1     | Х     | 1    | 0    | 0    |
| 1      | 0  | 0  | 1  | *           | *       | *    | 1  | 0  | 1  | 0  | Х     | Х     | 0    | 0    | 0    |
| 1      | 0  | 1  | 0  | *           | *       | *    | 0  | 1  | 0  | 1  | Х     | Х     | 0    | 0    | 0    |
| 1      | 0  | 1  | 1  | *           | *       | *    | 1  | 1  | 0  | 0  | Х     | Х     | 0    | 0    | 1    |
| 1      | 1  | 0  | 0  | *           | *       | *    | 0  | 0  | 0  | 0  | Х     | Х     | 0    | 0    | 0    |
| 1      | 1  | 0  | 1  | *           | *       | *    | 0  | 0  | 0  | 0  | Х     | Х     | 0    | 0    | 0    |
| 1      | 1  | 1  | 0  | *           | *       | *    | 0  | 0  | 0  | 0  | Х     | Х     | 0    | 0    | 0    |
| 1      | 1  | 1  | 1  | *           | *       | *    | 0  | 0  | 0  | 0  | Х     | Х     | 0    | 0    | 0    |

# Completing the GCD custom singlepurpose processor design

- We finished the datapath
- We have a state table for the next state and control logic
  - All that's left is combinational logic design
- This is not an optimized design, but we see the basic steps



a view inside the controller and datapath

# RT-level custom single-purpose processor design

- We often start with a state machine
  - Rather than algorithm
  - Cycle timing often too central to functionality
- Example
  - Bus bridge that converts4-bit bus to 8-bit bus
  - Start with FSMD
  - Known as registertransfer (RT) level
  - Exercise: complete the design



#### RT-level custom single-purpose processor design (cont')



## Optimizing single-purpose processors

- Optimization is the task of making design metric values the best possible
- Optimization opportunities
  - -original program
  - -FSMD
  - -datapath
  - -FSM

### Optimizing the original program

- Analyze program attributes and look for areas of possible improvement
  - –number of computations
  - -size of variable
  - -time and space complexity
  - —operations used
    - multiplication and division very expensive

#### Optimizing the original program (cont')

```
original program
 0: int x, y;
 1: while (1) {
 2: while (!go i);
 3: x = x i;
 4: y = y_i;
 5: while (x != y) {
                                     replace the subtraction
       if (x < y)
                                    operation(s) with modulo
 7:
        y = y - x;
                                   operation in order to speed
       else
                                           up program
 8:
        x = x - y;
 9: d o = x;
```

GCD(42, 8) - 9 iterations to complete the loop x and y values evaluated as follows: (42, 8), (34, 8), (26,8), (18,8), (10, 8), (2,8), (2,6), (2,4), (2,2).

```
optimized program
  0: int x, y, r;
  1: while (1) {
  2: while (!go i);
      // x must be the larger number
  3: if (x_i >= y_i) {
        x=x_i;
        y=y_i;
  6: else {
  7:
        x=y_i;
        y=x i;
  9: while (y != 0) {
        r = x \% y;
        x = y;
 12:
 13: d \circ = x;
GCD(42,8) - 3 iterations to complete the loop
x and y values evaluated as follows: (42, 8), (8,2),
```

(2,0)

## **Optimizing the FSMD**

- Areas of possible improvements
  - merge states
    - states with constants on transitions can be eliminated, transition taken is already known
    - states with independent operations can be merged
  - -separate states
    - states which require complex operations
       (a\*b\*c\*d) can be broken into smaller states to reduce hardware size
  - scheduling

## Optimizing the FSMD (cont.)



original FSMD

eliminate state 1 – transitions have constant values

*merge state 2 and state 2J* – no loop operation in between them

merge state 3 and state 4 – assignment operations are independent of one another

merge state 5 and state 6 – transitions from state 6 can be done in state 5

eliminate state 5J and 6J – transitions from each state can be done from state 7 and state 8, respectively

*eliminate state 1-J* – transition from state 1-J can be done directly from state 9

#### optimized FSMD



## Optimizing the datapath

- Sharing of functional units
  - one-to-one mapping, as done previously, is not necessary
  - if same operation occurs in different states,
     they can share a single functional unit
- Multi-functional units
  - ALUs support a variety of operations, it can be shared among operations occurring in different states

## Optimizing the FSM

#### State encoding

- task of assigning a unique bit pattern to each state in an FSM
- size of state register and combinational logic vary
- can be treated as an ordering problem
- State minimization
  - task of merging equivalent states into a single state
    - state equivalent if for all possible input combinations the two states generate the same outputs and transitions to the next same state